-
Notifications
You must be signed in to change notification settings - Fork 147
Major frontend refactor #313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The initial result is benchmarked below, the memory overhead and minor performance overhead are expected to be the outcome of algorithm and data structure used in this patch:
The performance overhead is expected to be came from multiuple token stream traversal, as seen 3 times at least in lexer, preprocessor, and parser. performance benchmark (stage 0)Before
After
performance benchmark (stage 1)Before
After
|
|
There are still some improvements left to do, e.g.:
|
|
I seem to unable to reply to the latest review suggestion so I'll reply here:
Perhaps? The approach requires reorganization of token kind order, and I don't think it's necessary to just boost performance while debugging (I didn't encoutered significant performance overhead when calling it)? |
|
I'm thinking an approach to validate our compiler's expansion only compilation flag (aka
@jserv what do you think? |
Furthermore, we can adapt the preprocessor implementation from slimcc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 issues found across 19 files
Prompt for AI agents (all 6 issues)
Understand the root cause of the following 6 issues and fix them.
<file name="src/preprocessor.c">
<violation number="1" location="src/preprocessor.c:891">
#ifdef/#ifndef ignore #undef because they only test raw map membership. Once a macro is #undef’d it remains in the hashmap, so these directives still treat it as defined.</violation>
</file>
<file name="src/globals.c">
<violation number="1" location="src/globals.c:1674">
error_at() bounds its diagnostic line scan by src->capacity instead of src->size, so it can walk past the actual source text and read uninitialized memory when printing errors.</violation>
<violation number="2" location="src/globals.c:1700">
error() now calls printf with a %s placeholder but never passes msg, so every diagnostic goes through undefined behavior and the message is lost.</violation>
</file>
<file name=".github/workflows/main.yml">
<violation number="1" location=".github/workflows/main.yml:81">
`make out/shecc` ignores the matrix compiler, so the clang matrix leg still builds with the default host compiler instead of clang.</violation>
</file>
<file name="src/lexer.c">
<violation number="1" location="src/lexer.c:561">
Character literals reset the running column counter, so every subsequent token on that line is reported at the wrong column. Increment the column instead of overwriting it.</violation>
</file>
<file name="src/main.c">
<violation number="1" location="src/main.c:104">
`libc_token_stream` is a cached token list shared via `TOKEN_CACHE`, so linking the user tokens into its tail mutates the cache and causes future includes of `lib/c.c` to re-emit the previous translation unit. Build a fresh concatenated stream or copy the libc tokens instead of modifying the cached list.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
This patch completely separate the preprocessor functionality from scanner-less parser to indepadent units, which allows compiler to expand and parse nested function-like macro, multi-token object-like macro, and more. Furthermore, the error diagnostic is rewritten to better allow user to find out where and what lexeme causes compiler to panic.
51bd234 to
73552dd
Compare
|
#221 is related to this patch, as this patch also fixes error diagnostic context issue. We may also close it upon merged into master branch. |
This patch aims to resolve most current preprocessor issues seen in parser unit, including:
__LINE__,__FILE__.In this approach, we introduce a whole new phase dedicated to preprocessing (this practice can be seen in chibicc and other similar cc), instead of binding the preprocessing phase in few different places.
Current status
This draft is still in development, the below screenshot is comparison between result of
cpp -Eandout/shecc -E:comparison
test.c:a.c:Current major compilation workflow hasn't affected by the patch, which is expected to resolve in later commits.
Memory overhead
Memory usage increasement is expected since we now use
token_tstruct to track position, and token stream will be generated at once for preprocessor and parser to use with. Once parser finished, we expect to free all tokens.Summary by cubic
Introduces a dedicated preprocessing phase and rebuilds token handling to fix include guard behavior, add robust macro expansion, and improve error diagnostics. Adds an -E mode to output preprocessed code comparable to cpp -E.
New Features
Bug Fixes
Written for commit 73552dd. Summary will update automatically on new commits.